<!DOCTYPE html><html lang="en">

<head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <meta name="description" content="Bioinformatics Notes">
    <meta name="author" content="Rui Wang">
 <!--作者-->
    <meta property="og:title" content="Life Film|Spring 2021">
 <!--页面标题-->
    <meta property="og:type" content="article">
    <meta property="og:site_name" content="Bioinformatics Notes">
    <meta property="article:tag" content="Bioinformatics, R Studio">
<!--页面标题-->
    <title>基因突变瀑布图</title>  
<!-- 百度统计-->
<script>
var _hmt = _hmt || [];
(function() {
  var hm = document.createElement("script");
  hm.src = "https://hm.baidu.com/hm.js?e076e26dccfe62cef6ef9a32f56a65e5";
  var s = document.getElementsByTagName("script")[0]; 
  s.parentNode.insertBefore(hm, s);
})();
</script>
 <!-- 谷歌ad-->
<script data-ad-client="ca-pub-1293855853975663" async src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>

 <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/katex@0.11.1/dist/katex.min.css" integrity="sha384-zB1R0rpPzHqg7Kpt0Aljp8JPLqbXI3bhnPWROx27a9N0Ll6ZP/+DiW/UqRcLbRjq" crossorigin="anonymous">
 <link rel="shortcut icon" type="image/x-icon" href="/favicon.ico">
 <link rel="stylesheet" href="/css/default.css" crossorigin="anonymous">
 <link rel="stylesheet" href="/css/blog.css" crossorigin="anonymous">
 <link rel="alternate" type="application/rss+xml" title="RSS" href="/rss.xml">	 
<!--订阅-->
</head>


  <body>
    <header class="hide-on-print">
        <div id="site-title">
            <a href="/pages/bioinfo/notelist.html">Bioinformatics Notes</a>
        </div>
        
    </header>
    <nav class="hide-on-print">
      <ul>
     <li><a href="/">About</a></li>
     <li><a href="/pages/blog.html">Blog</a></li>
	 <li><a href="/pages/vlog.html">Vlog</a></li>
     <li><a href="/files/cv.pdf">CV</a></li>
     <li><a href="/pages/tools.html">Tools</a></li>
     <li><a href="/pages/film.html">Film</a></li>
     <li><a href="/pages/24fps.html">24 FPS</a></li>
      </ul>
     </nav>

    <main>

        <article>
        <!--文章标题-->
        <h1 id="article-title">基因突变类型瀑布图</h1>
        <br>
<div>

  <h1>需求描述</h1>
<p>用R代码画出paper里的这种瀑布图。</p>
<iframe height="130px" src="https://ruiprime-pic.cn-sh2.ufileos.com/pic/20210403125145.pdf"></iframe>
<p>出自<a href='https://www.cell.com/cell/abstract/S0092-8674(17)30639-6' target='_blank' class='url'>https://www.cell.com/cell/abstract/S0092-8674(17)30639-6</a></p>
<h1>使用场景</h1>
<p>展示多个样本中多个基因的突变情况，包括但不限于SNP、indel、CNV。</p>
<p>尤其是几十上百的大样本量的情况下，一目了然，看出哪个基因在哪个人群里突变多，以及多种突变类型的分布。</p>
<ul>
<li>场景一：作为全基因组测序或全外显子组测序文章的第一个图。</li>
<li>场景二：展示感兴趣的癌症类型里，某一个通路的基因突变情况。</li>

</ul>
<p>这里用TCGAbiolinks下载数据，用maftools画图。如果要更灵活的定制，可参考FigureYa42oncoprint，用complexheatmap画图。</p>
<p>maftools功能丰富，浏览一下<a href='https://bioconductor.org/packages/release/bioc/vignettes/maftools/inst/doc/maftools.html' target='_blank' class='url'>https://bioconductor.org/packages/release/bioc/vignettes/maftools/inst/doc/maftools.html</a>，知道用它能画哪些图，需要时就知道过来找啦～</p>
<h1>环境设置</h1>
<p>使用国内镜像安装包</p>
<pre><code class='language-r' lang='r'>options(&quot;repos&quot;= c(CRAN=&quot;https://mirrors.tuna.tsinghua.edu.cn/CRAN/&quot;))
options(BioC_mirror=&quot;http://mirrors.ustc.edu.cn/bioc/&quot;)
BiocManager::install(&quot;TCGAbiolinks&quot;)
BiocManager::install(&quot;maftools&quot;)
</code></pre>
<p>加载包</p>
<pre><code class='language-r' lang='r'>library(TCGAbiolinks)
library(maftools)
Sys.setenv(LANGUAGE = &quot;en&quot;) #显示英文报错信息
options(stringsAsFactors = FALSE) #禁止chr转成factor
</code></pre>
<h1>输入文件下载</h1>
<p>需要下载突变信息和临床信息，作为输入数据。</p>
<pre><code class='language-r,message=false,warning=false' lang='r,message=false,warning=false'>#查看癌症项目名缩写
cancerName &lt;- data.frame(id = TCGAbiolinks:::getGDCprojects()$project_id, name = TCGAbiolinks:::getGDCprojects()$name)
head(cancerName)
</code></pre>
<h2>下载临床信息</h2>
<p>参数<code>project =</code>后面写你要看的癌症名称缩写</p>
<pre><code class='language-r,message=false' lang='r,message=false'>clinical &lt;- GDCquery(project = &quot;TCGA-LIHC&quot;, 
                  data.category = &quot;Clinical&quot;, 
                  file.type = &quot;xml&quot;)

GDCdownload(clinical)

cliquery &lt;- GDCprepare_clinic(clinical, clinical.info = &quot;patient&quot;)
colnames(cliquery)[1] &lt;- &quot;Tumor_Sample_Barcode&quot;
# 用View查看临床数据里有哪些项，不同癌症不一样，每个人感兴趣的性状也不同，根据自己的需要来选择感兴趣的列
# 理论上，每一列都可以作为分类信息画出来。
View(cliquery)

# 加载自定义分组，例如risk高低等信息，后面就可以像其他临床信息一样，按risk分组了
#group &lt;- read.table(&quot;group.txt&quot;)
#cliquery &lt;- merge(group, cliquery, by.x = &quot;sample&quot;, by.y = &quot;Tumor_Sample_Barcode&quot;)
</code></pre>
<p><img src="group.png" referrerpolicy="no-referrer"></p>
<p>这里history_hepato_carcinoma_risk_factors列有50种类型，按照例图，只提取其中的Hepatitis B、Hepatitis C和cirrhosis。</p>
<pre><code class='language-r' lang='r'>cliquery$HBV &lt;- 0
cliquery$HBV[grep(pattern = &quot;Hepatitis B&quot;,cliquery$history_hepato_carcinoma_risk_factors)] &lt;- 1

cliquery$HCV &lt;- 0
cliquery$HCV[grep(pattern = &quot;Hepatitis C&quot;,cliquery$history_hepato_carcinoma_risk_factors)] &lt;- 1

cliquery$cirrhosis &lt;- 0
cliquery$cirrhosis[grep(pattern = &quot;cirrhosis&quot;,cliquery$history_hepato_carcinoma_risk_factors)] &lt;- 1
</code></pre>
<p>后面我们将以<code>neoplasm_histologic_grade</code>，<code>gender</code>，<code>race_list</code>，<code>HCV</code>，<code>HBV</code>，<code>cirrhosis</code>作为分类画图。</p>
<h2>下载突变数据</h2>
<p>4种找mutation的pipeline任选其一：<code>muse</code>, <code>varscan2</code>, <code>somaticsniper</code>, <code>mutect2</code>，此处选mutect2。</p>
<pre><code class='language-r,message=false' lang='r,message=false'>mut &lt;- GDCquery_Maf(tumor = &quot;LIHC&quot;, pipelines = &quot;mutect2&quot;)
maf &lt;- read.maf(maf = mut, clinicalData = cliquery, isTCGA = T)
</code></pre>
<h1>开始画图</h1>
<p>oncoplot画法：<a href='https://bioconductor.org/packages/release/bioc/vignettes/maftools/inst/doc/oncoplots.html' target='_blank' class='url'>https://bioconductor.org/packages/release/bioc/vignettes/maftools/inst/doc/oncoplots.html</a></p>
<h2>展示突变频率排名前20的基因</h2>
<pre><code class='language-r' lang='r'>pdf(&quot;oncoplotTop20.pdf&quot;,width = 15,height = 10)
oncoplot(maf = maf,
         top = 20,
         clinicalFeatures = c(&quot;race_list&quot;, &quot;neoplasm_histologic_grade&quot;, &quot;gender&quot;, &quot;HCV&quot;, &quot;HBV&quot;, &quot;cirrhosis&quot;)
         )
dev.off()
</code></pre>
<iframe height="300px" src="https://ruiprime-pic.cn-sh2.ufileos.com/pic/20210403125145.pdf" </iframe>
<h2>用默认配色方案展示一个列表里的基因</h2>
<p>把你感兴趣的基因名存到<code>easy_input.txt</code>文件里，每个基因为一行。就可以只画这几个基因，其中没发生突变的基因会被忽略。</p>
<pre><code class='language-r' lang='r'>geneName &lt;- read.table(&quot;easy_input.txt&quot;,header = F,as.is = T)
genelist &lt;- geneName$V1

pdf(&quot;oncoplotLIST.pdf&quot;,width = 12,height = 6)
oncoplot(maf = maf,
         genes = genelist,
         clinicalFeatures = c(&quot;race_list&quot;,&quot;neoplasm_histologic_grade&quot;,&quot;gender&quot;,&quot;HCV&quot;,&quot;HBV&quot;,&quot;cirrhosis&quot;)
         )
dev.off()
</code></pre>
<iframe height="300px" src="https://ruiprime-pic.cn-sh2.ufileos.com/pic/20210403125202.pdf" </iframe>
<h2>重新配色展示突变频率排名前20的基因</h2>
<p>默认的颜色不够好看，还可以改配色</p>
<pre><code class='language-r' lang='r'>#突变
col = RColorBrewer::brewer.pal(n = 10, name = &#39;Paired&#39;)
names(col) = c(&#39;Frame_Shift_Del&#39;,&#39;Missense_Mutation&#39;, &#39;Nonsense_Mutation&#39;, &#39;Frame_Shift_Ins&#39;,&#39;In_Frame_Ins&#39;, &#39;Splice_Site&#39;, &#39;In_Frame_Del&#39;,&#39;Nonstop_Mutation&#39;,&#39;Translation_Start_Site&#39;,&#39;Multi_Hit&#39;)

#人种
racecolors = RColorBrewer::brewer.pal(n = 4,name = &#39;Spectral&#39;)
names(racecolors) = c(&quot;ASIAN&quot;, &quot;WHITE&quot;, &quot;BLACK_OR_AFRICAN_AMERICAN&quot;,  &quot;AMERICAN_INDIAN_OR_ALASKA_NATIVE&quot;)

#病毒、肝硬化
factcolors = c(&quot;grey&quot;,&quot;black&quot;)
names(factcolors) = c(&quot;0&quot;,&quot;1&quot;)
annocolors = list(race_list = racecolors, 
                  HCV = factcolors, 
                  HBV = factcolors, 
                  cirrhosis = factcolors)

#画图
pdf(&quot;oncoplotTop20_col.pdf&quot;,width = 15,height = 10)
oncoplot(maf = maf,
         colors = col,#给突变配色
         annotationColor = annocolors, #给临床信息配色
         top = 20,
         clinicalFeatures = c(&quot;race_list&quot;,&quot;HCV&quot;,&quot;HBV&quot;,&quot;cirrhosis&quot;),
         writeMatrix =T)
dev.off()
</code></pre>
<iframe height="130px" src="https://ruiprime-pic.cn-sh2.ufileos.com/pic/20210403125227.pdf"></iframe>
<p>还可以按样本信息来分组。这里按人种来分组，你还可以按照其他感兴趣的特征来分组，例如是否有病毒感染、基因型、生存信息等</p>
<pre><code class='language-r' lang='r'>#png(&quot;oncoplotGroup.png&quot;,width = 1000,height = 500)
pdf(&quot;oncoplotGroup.pdf&quot;,width = 15,height = 10)
oncoplot(maf = maf,
         colors = col,#给突变配色
         annotationColor = annocolors, #给临床信息配色
         top = 20,
         clinicalFeatures = c(&quot;race_list&quot;,&quot;HCV&quot;,&quot;HBV&quot;,&quot;cirrhosis&quot;), # 分组依据排在第一位
         sortByAnnotation = TRUE,
         writeMatrix =T)
dev.off()
</code></pre>
<iframe height="130px"  src="https://ruiprime-pic.cn-sh2.ufileos.com/pic/20210403125248.pdf" ></iframe>
<pre><code class='language-r' lang='r'>sessionInfo()
</code></pre>
	

          
          </div>
<div class="hide-on-print">
    <div class="info">Posted by <a href="https://blog.csdn.net/qq_33592585">Ricky</a> </div>
    <div class="info">Tags: R, Bioinformatics Notes</div>
</div>


      </article>
    </main>
   <footer class="hide-on-print">© 2016 - <time><script>document.write(new Date().getFullYear())</script></time> <a href="/" rel="author">Rui Wang</a>.
    <a href="/README.html">Blog README</a>.</footer>

</body>

</html>
